@virajmehta virajmehta commented Jan 29, 2026

Previously, a task was not failed if its lease had already expired by the time it wrote a checkpoint. We now throw a control error when a task attempts to write a checkpoint with an expired lease.


Note

Medium Risk
Changes core Postgres lease/checkpoint functions and worker control-flow handling; mistakes could cause unexpected task termination or lock contention under concurrency.

Overview
Checkpoints/heartbeats now enforce active leases. durable.set_task_checkpoint_state and durable.extend_claim were updated to acquire their locks in task -> run order and to validate that the run is still running with an unexpired claim_expires_at; if either condition fails, they raise SQLSTATE AB002 (added via a new migration).
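
To make the validation rule concrete, here is a minimal Rust model of the check described above. It is only a sketch: the real check lives in the Postgres functions, and the `RunState`, `RunLease`, and `check_lease` names are hypothetical and not taken from the PR. It also assumes that a lease expiring exactly at `now` counts as expired, which the description does not specify.

```rust
use std::time::SystemTime;

/// SQLSTATE that the PR description says is raised for an invalid lease.
pub const LEASE_EXPIRED_SQLSTATE: &str = "AB002";

/// Hypothetical run states; the real schema may define others.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunState {
    Running,
    Completed,
    Failed,
}

/// Hypothetical, simplified view of a run row: only the two fields the check reads.
#[derive(Debug, Clone, Copy)]
pub struct RunLease {
    pub state: RunState,
    pub claim_expires_at: Option<SystemTime>,
}

/// Returns Ok(()) only if the run is still running and its claim has not expired.
/// Any other combination yields the lease-expired SQLSTATE.
/// Assumption: a claim that expires exactly at `now` is treated as expired.
pub fn check_lease(run: &RunLease, now: SystemTime) -> Result<(), &'static str> {
    match (run.state, run.claim_expires_at) {
        (RunState::Running, Some(expires_at)) if expires_at > now => Ok(()),
        _ => Err(LEASE_EXPIRED_SQLSTATE),
    }
}
```

In this model, both a checkpoint write and a claim extension would call `check_lease` before touching any state, so a worker that has lost its lease cannot write after another worker has reclaimed the run.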

Runtime behavior changes. Rust adds ControlFlow::LeaseExpired (mapped from AB002) and the worker treats it as non-fatal control flow (logs and stops without double-failing the run). Tests add SleepThenCheckpointTask plus new lease-expiry cases asserting no checkpoints are written after lease expiry; benches add hierarchical parent/child/grandchild tasks for stress testing.
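
The mapping from AB002 to non-fatal control flow might look roughly like the sketch below. Only `ControlFlow::LeaseExpired` and the `AB002` code come from the description above; `TaskError`, `WorkerAction`, `classify_db_error`, and `on_task_error` are hypothetical names used for illustration, and the actual worker code may be structured quite differently.

```rust
/// Control-flow signals. Only `LeaseExpired` is named in the PR description;
/// the real enum presumably has other variants.
#[derive(Debug, PartialEq, Eq)]
pub enum ControlFlow {
    LeaseExpired,
}

/// Hypothetical error type separating control flow from genuine failures.
#[derive(Debug, PartialEq, Eq)]
pub enum TaskError {
    Control(ControlFlow),
    Database { sqlstate: Option<String>, message: String },
}

/// Hypothetical description of what the worker does after a task returns an error.
#[derive(Debug, PartialEq, Eq)]
pub enum WorkerAction {
    /// Log and stop; the run is left for whoever holds the lease now.
    StopWithoutFailing,
    /// Mark the run as failed.
    FailRun,
}

/// Classify a database error by SQLSTATE: AB002 becomes control flow,
/// everything else stays an ordinary database error.
pub fn classify_db_error(sqlstate: Option<&str>, message: &str) -> TaskError {
    match sqlstate {
        Some("AB002") => TaskError::Control(ControlFlow::LeaseExpired),
        other => TaskError::Database {
            sqlstate: other.map(str::to_owned),
            message: message.to_owned(),
        },
    }
}

/// Decide how the worker reacts. A lost lease is not the task's fault,
/// so the run is not failed a second time.
pub fn on_task_error(err: &TaskError) -> WorkerAction {
    match err {
        TaskError::Control(ControlFlow::LeaseExpired) => {
            eprintln!("lease expired; stopping task without failing the run");
            WorkerAction::StopWithoutFailing
        }
        TaskError::Database { .. } => WorkerAction::FailRun,
    }
}
```

Keeping the lease-expired case out of the failure path is what avoids the double-fail: the run has already been handled by the lease-expiry logic, so the stale worker only needs to stop.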

Written by Cursor Bugbot for commit 1f82eb7.

@virajmehta virajmehta force-pushed the viraj/fail-on-lease-expiration branch from e315639 to 7d82aa4 January 30, 2026 21:10
@virajmehta virajmehta enabled auto-merge January 30, 2026 22:12
@virajmehta virajmehta added this pull request to the merge queue Jan 30, 2026
Merged via the queue into main with commit 5e0e786 Jan 30, 2026
2 checks passed